On extending VTLN to phoneme-specific warping in automatic speech recognition
Abstract
Phoneme- and formant-specific warping has been shown to decrease formant and cepstral mismatch. These findings have not yet been fully implemented in speech recognition. This paper discusses a few possible reasons why. It also includes a small experimental study in which phoneme-independent warping is extended towards phoneme-specific warping. The results did not show a significant decrease in recognition error rate, which is in line with earlier experiments on the methods discussed in the paper.
Similar resources
On the Use of a Wave-Reflection Model for the Estimation of Spectral Effects due to Vocal Tract Length Changes with Application to Automatic Speech Recognition
Vocal tract length normalization (VTLN) is commonly used in state-of-the-art automatic speech recognition (ASR) systems to reduce the mismatch between speaker-dependent formant frequency scalings. Usually, the normalization is done by a piece-wise linear scaling of the filter bank center frequencies. The linear scaling is motivated by a uniform acoustic tube model that does not take any loss ef...
Fast Estimation of Warping Factors in Vocal Tract Length Normalization Using Scores Obtained from Gender-Detection Modeling
The performance of automatic speech recognition (ASR) systems is adversely affected by variations in speakers, audio channels and environmental conditions. Making these systems robust to these variations is still a big challenge. One of the main sources of variation among speakers is the difference in their vocal tract length (VTL). Vocal Tract Length Normalization (VTLN) is an effe...
Impact of Vocal Tract Length Normalization on the Speech Recognition Performance of an English Vowel Phoneme Recognizer for the Recognition of Children Voices
Differences in human vocal tract length cause inter-speaker acoustic variability in speech signals when different speakers utter the same text, and this variability reduces the robustness of a speaker-independent (SI) speech recognition system. Speaker normalization using vocal tract length normalization (VTLN) is an effective approach to reduce the effect of these...
Vocal tract length compensation in the signal and model domains in child speech recognition
In a newly started project, KOBRA, we study methods to reduce the required amount of training data for speech recognition by combining the conventional data-driven training approach with available partial knowledge on speech production, implemented as transformation functions in the acoustic, articulatory and speaker characteristic domains. Initially, we investigate one well-known dependence, t...
A Study on Combining VTLN and SAT to Improve the Performance of Automatic Speech Recognition
In this paper, we present ideas for combining VTLN and SAT to improve the performance of automatic speech recognition. We show that VTLN matrices can be used as SAT transformation matrices in recognition, though the training still follows conventional SAT. This will be useful when there is very little adaptation data and the SAT transformation matrix cannot be estimated to perform the required ad...